R-CNN minus R
Abstract
Deep convolutional neural networks (CNNs) have had a major impact in most areas of image understanding, including object category detection. In object detection, methods such as R-CNN have obtained excellent results by integrating CNNs with region proposal generation algorithms such as selective search. In this paper, we investigate the role of proposal generation in CNN-based detectors in order to determine whether it is a necessary modelling component, carrying essential geometric information not contained in the CNN, or whether it is merely a way of accelerating detection. We do so by designing and evaluating a detector that uses a trivial region generation scheme, constant for each image. Combined with SPP, this results in an excellent and fast detector that does not need to process the image with any algorithm other than the CNN itself. We also streamline and simplify the training of CNN-based detectors by integrating several learning steps into a single algorithm, and we propose a number of improvements that accelerate detection.
Object detection is one of the core problems in image understanding. Until recently, the best performing detectors in standard benchmarks such as PASCAL VOC were based on a combination of handcrafted image representations, such as SIFT, HOG, and the Fisher Vector, and a form of structured output regression, from sliding windows to deformable parts models. Recently, however, these pipelines have been significantly outperformed by ones based on deep learning, which acquire representations automatically from data using Convolutional Neural Networks (CNNs). Currently, the best CNN-based detectors are based on the R-CNN construction of [3]. Conceptually, R-CNN is remarkably simple: it samples image regions using a proposal mechanism such as Selective Search (SS; [6]) and classifies them as foreground or background using a CNN.
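The idea of a region set that is "constant for each image" can be illustrated with a minimal sketch. The grid of scales, aspect ratios, and positions below is purely an assumption made for illustration (the function name `constant_boxes` and all of its parameters are hypothetical); it is not the scheme evaluated by the authors, only an example of a proposal set that depends on the image size and never on the image content.

```python
from itertools import product


def constant_boxes(width, height, scales=(0.25, 0.5, 1.0),
                   aspect_ratios=(0.5, 1.0, 2.0), steps=4):
    """Return a fixed, content-independent list of (x0, y0, x1, y1) boxes.

    All parameters are illustrative assumptions, not values from the paper.
    """
    boxes = set()
    for scale, ratio in product(scales, aspect_ratios):
        # Box size as a fraction of the image, stretched by the aspect ratio
        # and clipped so that it never exceeds the image.
        bw = min(width, width * scale * ratio ** 0.5)
        bh = min(height, height * scale / ratio ** 0.5)
        # Slide the box over a regular grid of `steps` positions per axis.
        for i, j in product(range(steps), repeat=2):
            x0 = (width - bw) * i / max(steps - 1, 1)
            y0 = (height - bh) * j / max(steps - 1, 1)
            # Rounding lets the set drop duplicates, e.g. the full-image box.
            boxes.add((round(x0, 2), round(y0, 2),
                       round(x0 + bw, 2), round(y0 + bh, 2)))
    return sorted(boxes)


# The same image size always yields the same boxes, so no per-image
# proposal algorithm has to run at test time.
proposals = constant_boxes(640, 480)
```

Because the list is a pure function of the image size, it can be computed once and reused for every image of that size.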
The first question that we address here is whether CNNs contain sufficient geometric information to localise objects, or whether this information must be supplemented by an external mechanism, such as region proposal generation. There are in fact two hypotheses. The first is that the only role of proposal generation is to cut down computation, by allowing the CNN, which is expensive, to be evaluated on a small number of image regions. The second is that proposal generation instead provides geometric information that is essential for accurate object localisation and is not represented in the CNN. This is not unlikely, given that CNNs are often trained to be highly invariant to even large geometric deformations and hence may not be sensitive to an object’s location.
The second question is whether the R-CNN pipeline can be simplified. While conceptually straightforward, R-CNN in fact comprises many practical steps that need to be carefully implemented and tuned to obtain good performance. R-CNN builds on a CNN, such as the AlexNet network [5], pre-trained on an image classification task such as ImageNet ILSVRC [1]. This CNN is ported to detection by: i) learning an SVM classifier for each object class on top of the last fully-connected layer of the network, ii) fine-tuning the CNN on the task of discriminating objects and background, and iii) learning a bounding box regressor for each object class. We simplify these steps, which require running a mix of different software on cached data, by training a single CNN that addresses all required tasks.
The third question is whether R-CNN can be accelerated. A substantial speedup was already obtained with spatial pyramid pooling (SPP) by [4], by realising that convolutional features can be shared among different regions rather than being recomputed for each one. However, this does not accelerate training, and at test time the region proposal generation mechanism becomes the new bottleneck.
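The feature sharing behind SPP can be sketched as follows: the convolutional feature map is computed once per image, and each region then only pools the part of that map lying under its box. This is a simplified illustration under stated assumptions, not the implementation of [4]: it uses a single channel, a single pyramid level, and a hypothetical `pool_region` helper with an assumed `stride` between image pixels and feature-map cells.

```python
import math


def pool_region(feature_map, box, stride=1, grid=2):
    """Max-pool the cells of `feature_map` under `box` into grid x grid bins.

    feature_map: list of rows holding one channel of shared conv features.
    box: (x0, y0, x1, y1) in image pixels; `stride` maps pixels to cells.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    x0, y0, x1, y1 = box
    # Project the box onto the feature map, keeping at least one cell.
    fx0 = min(max(math.floor(x0 / stride), 0), cols - 1)
    fy0 = min(max(math.floor(y0 / stride), 0), rows - 1)
    fx1 = min(max(math.ceil(x1 / stride), fx0 + 1), cols)
    fy1 = min(max(math.ceil(y1 / stride), fy0 + 1), rows)
    pooled = []
    for gy in range(grid):
        lo_y = fy0 + math.floor(gy * (fy1 - fy0) / grid)
        hi_y = fy0 + math.ceil((gy + 1) * (fy1 - fy0) / grid)
        for gx in range(grid):
            lo_x = fx0 + math.floor(gx * (fx1 - fx0) / grid)
            hi_x = fx0 + math.ceil((gx + 1) * (fx1 - fx0) / grid)
            # Every bin is non-empty because floor(a) < ceil(b) when a < b.
            pooled.append(max(feature_map[r][c]
                              for r in range(lo_y, hi_y)
                              for c in range(lo_x, hi_x)))
    return pooled


# One shared feature map (computed once), pooled for two different regions.
shared = [[4 * r + c for c in range(4)] for r in range(4)]
whole_image = pool_region(shared, (0, 0, 4, 4))
lower_right = pool_region(shared, (2, 2, 4, 4))
```

Both regions yield a fixed-length descriptor (here `grid * grid` values) regardless of box size, which is what allows the fully-connected layers to be applied to arbitrary regions without rerunning the convolutions.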
Our first improvement is that we are able to skip the SVM training step, which involves hard negative mining. Furthermore, at test time the whole detector, including the detection of multiple object classes and bounding box regression, reduces to evaluating a single CNN (implemented in MatConvNet [7]), which already brings the significant speedup shown in Table 1. Table 1 also shows that, for the SPP detector, the main bottleneck is bounding box generation. We show that picking a constant set of boxes ...

Table 1 (times in ms):

| Impl. | SelS | Prep. | Move | Conv | SPP | FC | BBR | Σ−SelS |
|---|---|---|---|---|---|---|---|---|
| SPP MS | 1.98 · 10^3 | 23.3 | 67.5 | 186.6 | 211.1 | 91.0 | 39.8 | 619.2 ± 118.0 |
| OURS | | 23.7 | 17.7 | 179.4 | 38.9 | 87.9 | 9.8 | 357.4 ± 34.3 |
| SPP SS | | 9.0 | 47.7 | 31.1 | 207.1 | 90.4 | 39.9 | 425.1 ± 117.0 |
| OURS | | 9.0 | 3.0 | 30.3 | 19.4 | 88.0 | 9.8 | 159.5 ± 31.5 |
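The Table 1 figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each OURS row pairs with the SPP row listed before it (the labels "OURS MS" and "OURS SS" are our reading of the flattened table, not labels from the source) and that the single SelS value applies at least to the SPP MS row.

```python
# Per-stage timings in ms, copied from Table 1 (Prep., Move, Conv, SPP, FC, BBR).
timings_ms = {
    "SPP MS": (23.3, 67.5, 186.6, 211.1, 91.0, 39.8),
    "OURS MS": (23.7, 17.7, 179.4, 38.9, 87.9, 9.8),   # assumed pairing
    "SPP SS": (9.0, 47.7, 31.1, 207.1, 90.4, 39.9),
    "OURS SS": (9.0, 3.0, 30.3, 19.4, 88.0, 9.8),      # assumed pairing
}
selective_search_ms = 1.98e3  # SelS column, listed on the SPP MS row

# The column "Σ−SelS" is the sum of all stages except selective search.
totals = {name: sum(stages) for name, stages in timings_ms.items()}

# Speedup of the single-CNN detector over the SPP baseline, excluding SelS.
speedup_ms = totals["SPP MS"] / totals["OURS MS"]
speedup_ss = totals["SPP SS"] / totals["OURS SS"]

# Share of the total SPP MS time spent in proposal generation.
sels_share = selective_search_ms / (selective_search_ms + totals["SPP MS"])

for name, total in totals.items():
    print(f"{name}: {total:.1f} ms without SelS")
print(f"speedup MS: {speedup_ms:.2f}x, SS: {speedup_ss:.2f}x")
print(f"SelS share of SPP MS time: {sels_share:.0%}")
```

The recomputed sums agree with the Σ−SelS column to within rounding. Under the assumed pairing, the speedup is roughly 1.7x for the MS rows and 2.7x for the SS rows, and selective search accounts for about three quarters of the SPP MS time, which is consistent with the statement that proposal generation is the main bottleneck.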
Similar resources
Prostate segmentation and lesions classification in CT images using Mask R-CNN
Purpose: Non-cancerous prostate lesions such as prostate calcification, prostate enlargement, and prostate inflammation cause many problems for men’s health. This research proposes a novel approach, a combination of image processing techniques and deep learning methods for classification and segmentation of the prostate in CT-scan images by considering the experienced physicians’ reports. ...
Deep learning-based CAD systems for mammography: A review article
Breast cancer is one of the most common types of cancer in women. Screening mammography is a low‑dose X‑ray examination of breasts, which is conducted to detect breast cancer at early stages when the cancerous tumor is too small to be felt as a lump. Screening mammography is conducted for women with no symptoms of breast cancer, for early detection of cancer when the cancer is most treatable an...
The relationship between perceptions of organizational politics and turnover intention, job performance, and organizational citizenship behavior: testing the mediating role of organizational justice
Objective: Political behavior in organizations is influenced by the differences in perceptions and attitudes of the staff, nature of the action, or people's perception of reality. Such behavior stems from the perception and reaction to self-interest. Different studies have shown that this behavior is an inevitable part of any human activity. Staff when asked about political behavior in the wo...
Face R-CNN
Faster R-CNN is one of the most representative and successful methods for object detection, and has become increasingly popular in various object detection applications. In this report, we propose a robust deep face detection approach based on Faster R-CNN. In our approach, we exploit several new techniques including new multi-task loss function design, online hard example mining, and...
Learning Oriented Region-based Convolutional Neural Networks for Building Detection in Satellite Remote Sensing Images
Automated building detection in aerial images is a fundamental problem encountered in aerial and satellite image analysis. Recently, thanks to advances in feature descriptions, the Region-based CNN model (R-CNN) for object detection is receiving increasing attention. Despite the excellent performance in object detection, it is problematic to directly leverage the features of R-CNN model...